feat: Add Ollama and local model support#26

Open
FrenzyVJN wants to merge 1 commit into huggingface:main from FrenzyVJN:main

Conversation

@FrenzyVJN

Add Ollama and Local Model Support

Summary

This PR adds comprehensive support for using Ollama and other local LLM providers (like LM Studio) with upskill's generate and eval commands. Users can now run skill generation and evaluation entirely locally without requiring API keys from cloud providers.

Motivation

Currently, upskill requires API keys for cloud providers (Anthropic, OpenAI, etc.) to function. This PR enables:

  • Privacy: Run entirely locally without sending data to external APIs
  • Cost savings: Use free local models instead of paid API calls
  • Offline usage: Work without internet connectivity
  • Model flexibility: Test with any model supported by Ollama (llama3.2, qwen2.5-coder, etc.)

Changes

New Command-Line Flags

Both generate and eval commands now support:

  • --base-url <url>: Custom API endpoint for local models (e.g., http://localhost:11434/v1)
  • --provider <name>: API provider (optional, auto-detected as generic when --base-url is provided)

Implementation Details

  1. Model Factory Monkey Patch

    • Patches FastAgent's ModelFactory.parse_model_string() to handle unknown model names
    • Falls back to generic provider when GENERIC_BASE_URL environment variable is set
    • Catches ModelConfigError and returns ModelConfig(provider=Provider.GENERIC, model_name=model_string)
  2. Environment Variable Configuration

    • Sets GENERIC_BASE_URL to point to local API endpoint
    • Sets GENERIC_API_KEY to "local" (required but unused by Ollama)
    • Sets ANTHROPIC_API_KEY to "dummy" to bypass startup checks
  3. Model String Formatting

    • For non-generic providers: prepends provider prefix (e.g., anthropic.claude-3-5-sonnet)
    • For generic provider: passes model name as-is (e.g., llama3.2:latest)
    • Monkey patch handles unknown models automatically
  4. Small Model Improvements

    • Updated generation prompt to be more explicit and direct
    • Added code fence stripping for models that wrap output in markdown blocks
    • This helps smaller models generate valid SKILL.md format
  5. Skills Directory Support

    • FastAgent now loads skills from ./skills/ directory if it exists
    • Added {{agentSkills}} placeholder to skill_gen agent card
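The fallback in step 1 can be sketched as follows. The FastAgent types (`ModelFactory`, `ModelConfig`, `Provider`, `ModelConfigError`) are replaced with minimal stand-ins here so the sketch is self-contained and runnable; only the wrapper logic at the bottom mirrors what the PR actually does, and the exact FastAgent signatures may differ.

```python
import os

# Stand-ins for FastAgent's types, so this sketch is self-contained.
# In the real patch these come from FastAgent's model factory module.
class ModelConfigError(Exception):
    pass

class ModelConfig:
    def __init__(self, provider, model_name):
        self.provider = provider
        self.model_name = model_name

class Provider:
    GENERIC = "generic"

class ModelFactory:
    _KNOWN = {"anthropic.claude-3-5-sonnet"}

    @classmethod
    def parse_model_string(cls, model_string):
        if model_string not in cls._KNOWN:
            raise ModelConfigError(f"unknown model: {model_string}")
        return ModelConfig("anthropic", model_string)

# The patch: wrap the original classmethod and fall back to the
# generic provider whenever GENERIC_BASE_URL is set.
_original = ModelFactory.parse_model_string.__func__

def _patched(cls, model_string):
    try:
        return _original(cls, model_string)
    except ModelConfigError:
        if os.environ.get("GENERIC_BASE_URL"):
            return ModelConfig(Provider.GENERIC, model_string)
        raise

ModelFactory.parse_model_string = classmethod(_patched)

# With the endpoint configured, unknown Ollama-style names now resolve.
os.environ["GENERIC_BASE_URL"] = "http://localhost:11434/v1"
cfg = ModelFactory.parse_model_string("llama3.2:latest")
print(cfg.provider, cfg.model_name)  # generic llama3.2:latest
```

Known model strings still go through the original parser unchanged, so cloud-provider behavior is unaffected.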

Usage Examples

Generate Skill with Ollama

# Generate skill without evaluation
upskill generate "parse YAML files" \
  --model llama3.2:latest \
  --base-url http://localhost:11434/v1 \
  --no-eval \
  -o ./my-skill

# Generate with explicit provider
upskill generate "document code" \
  --model qwen2.5-coder:7b \
  --provider generic \
  --base-url http://localhost:11434/v1 \
  --no-eval

Evaluate Skill with Local Model

# Evaluate with manual test cases
upskill eval ./skills/my-skill \
  --model qwen2.5-coder:7b \
  --base-url http://localhost:11434/v1 \
  --tests tests.json

# Verbose output
upskill eval ./skills/my-skill \
  --model llama3.2:latest \
  --base-url http://localhost:11434/v1 \
  --tests tests.json \
  -v

Testing

Tested with:

  • ✅ llama3.2:latest: basic generation and evaluation work
  • ✅ qwen2.5-coder:7b: better results, 25-75% success rates on simple tasks
  • ✅ Skills loading from ./skills/ directory
  • ✅ Code fence stripping for wrapped outputs
  • ✅ Generate command with --no-eval
  • ✅ Eval command with manual test cases (--tests)
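The code fence stripping checked off above can be sketched like this. `strip_code_fences` is a hypothetical helper name for illustration; the actual function in `generate.py` may be structured differently.

```python
import re

def strip_code_fences(text: str) -> str:
    """Remove a wrapping markdown code fence, if the model emitted one.

    Illustrative helper; the real implementation lives in generate.py.
    """
    text = text.strip()
    # Match an opening fence (with optional language tag), the wrapped
    # body, and a closing fence at the very end of the output.
    match = re.match(r"^```[a-zA-Z0-9_-]*\n(.*)\n```$", text, re.DOTALL)
    return match.group(1) if match else text

wrapped = "```markdown\n# My Skill\n\nInstructions here.\n```"
print(strip_code_fences(wrapped))
```

Output that is not wrapped in a fence passes through unchanged, so the helper is safe to apply to every model response.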

Example Test Results

Hello World Skill with qwen2.5-coder:7b:

  • Baseline: 50% success (2/4 tests)
  • With Skill: 75% success (3/4 tests)
  • Improvement: +25 percentage points
  • Recommendation: Keep skill ✅

Known Limitations

  1. Test Case Generation: The --eval-model flag with automatic test generation may not work due to FastAgent's structured() method not properly respecting model overrides.

    Workaround: Use --no-eval flag during generation, then create test cases manually and run eval separately:

    # Generate without eval
    upskill generate "task" --model llama3.2 --base-url http://localhost:11434/v1 --no-eval
    
    # Create tests.json manually
    
    # Evaluate with manual tests
    upskill eval ./skill --model llama3.2 --base-url http://localhost:11434/v1 --tests tests.json
  2. Small Model Quality: Models like llama3.2 (3B parameters) may produce lower quality skills compared to larger cloud models. Recommended to use 7B+ models like qwen2.5-coder:7b for better results.

Backwards Compatibility

All changes are fully backwards compatible:

  • Existing commands work exactly as before
  • New flags are optional
  • Default behavior unchanged
  • No breaking changes to API or configuration

Files Changed

  • src/upskill/cli.py (+148 lines): Added Ollama support, new flags, environment setup
  • src/upskill/generate.py (+36 lines): Code fence stripping, improved prompts, RequestParams support
  • src/upskill/agent_cards/skill_gen.md (+2 lines): Added {{agentSkills}} placeholder

Future Improvements

Potential follow-ups (not in this PR):

  • Document recommended model sizes for different tasks
  • Add model capability detection
  • Improve test case generation for local models
  • Add progress indicators for long-running generations
  • Support for other local providers (Llama.cpp, vLLM, etc.)

Checklist

  • Code follows project style guidelines
  • All changes are backwards compatible
  • Tested with multiple local models (llama3.2, qwen2.5-coder)
  • Documentation included in commit message
  • No breaking changes
  • Error handling for missing Ollama server
  • Works with both generate and eval commands

…luation

This commit adds comprehensive support for using Ollama and other local LLM providers (like LM Studio) with upskill's generate and eval commands.

## Changes

### Core Features
- Add --base-url and --provider flags to both generate and eval commands
- Monkey patch FastAgent's ModelFactory to handle unknown model names when GENERIC_BASE_URL is set
- Auto-detect 'generic' provider when --base-url is provided
- Set dummy API keys to bypass authentication checks when using local models

### Generation Improvements
- Update prompt to be more explicit for smaller models
- Add code fence stripping for models that wrap output in markdown blocks
- Pass model parameter through RequestParams to all FastAgent calls
- Support model override for all generation functions (generate_skill, generate_tests, improve_skill, refine_skill)

### Evaluation Improvements
- Add environment variable configuration for eval command
- Format model strings correctly for generic provider
- Support loading skills from ./skills/ directory

### Bug Fixes
- Fix classmethod monkey patch to properly access __func__
- Fix model formatting logic for eval_model parameter
- Add {{agentSkills}} placeholder to skill_gen agent card to enable skill loading
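For context on the `__func__` fix: a classmethod accessed on its class is already a bound method, so wrapping it naively and re-installing it breaks the descriptor. A minimal illustration with a toy class (not FastAgent's actual factory):

```python
class Factory:
    @classmethod
    def parse(cls, s):
        return ("orig", s)

# Factory.parse is a *bound* classmethod; to wrap it and re-install it
# as a classmethod, grab the underlying function via __func__ first.
original = Factory.parse.__func__

def patched(cls, s):
    result = original(cls, s)
    return ("patched", *result)

Factory.parse = classmethod(patched)
print(Factory.parse("x"))  # ('patched', 'orig', 'x')
```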

## Usage Examples

Generate skill with Ollama:
  upskill generate "parse YAML" --model llama3.2:latest \
    --base-url http://localhost:11434/v1 --no-eval

Evaluate skill with local model:
  upskill eval ./skills/my-skill --model qwen2.5-coder:7b \
    --base-url http://localhost:11434/v1 --tests tests.json

## Technical Details

The implementation uses FastAgent's generic provider support with environment variables:
- GENERIC_BASE_URL: Points to local API endpoint (e.g., http://localhost:11434/v1)
- GENERIC_API_KEY: Set to "local" (required but unused by Ollama)
- ANTHROPIC_API_KEY: Set to "dummy" to bypass startup checks

The monkey patch catches ModelConfigError for unknown models and falls back to generic provider when GENERIC_BASE_URL is configured.

## Limitations

Test case generation with --eval-model in generate command may not work due to FastAgent's structured() method not properly respecting model overrides. Workaround: use --no-eval and provide test cases manually with --tests flag.
@evalstate
Collaborator

Thanks for the patch -- for these models generic.qwen2.5-coder:7b should work out of the box?

@FrenzyVJN
Author

Good question — I tested this flow to confirm the behavior.

Currently, generic.qwen2.5-coder:7b does not work without additional configuration, because the generic provider still needs a base_url to know which API endpoint to call. Without it, FastAgent falls back to the default provider and errors (e.g., missing Anthropic key).

There is also a secondary issue: when using the generic provider, the full model string (generic.qwen2.5-coder:7b) is forwarded to Ollama, but Ollama expects just qwen2.5-coder:7b.

The --base-url approach resolves both problems by:

  • explicitly selecting the generic-compatible endpoint
  • avoiding unintended provider fallback
  • keeping the CLI usage predictable for local model setups

That said, we could improve this further by stripping the generic. prefix before sending the request. This would make the behavior more forgiving and align better with user expectations.

Happy to implement that if you think it's the right direction — otherwise I'm comfortable keeping the current explicit configuration to avoid hidden magic.
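That prefix stripping could be as simple as the following sketch (illustrative only, not part of this PR; `normalize_model_name` is a hypothetical name):

```python
def normalize_model_name(model_string: str) -> str:
    """Strip a leading 'generic.' so Ollama sees the bare model name.

    Illustrative only; not implemented in this PR.
    """
    prefix = "generic."
    if model_string.startswith(prefix):
        return model_string[len(prefix):]
    return model_string

print(normalize_model_name("generic.qwen2.5-coder:7b"))  # qwen2.5-coder:7b
```

Names without the prefix pass through untouched, so existing invocations would be unaffected.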
